Abstract

There is an increasing demand for space robotic systems
that can reduce the number of potentially hazardous
EVAs on manned space missions. In addition,
telerobotic maneuvers can easily become long and tiresome
for the operator. This paper describes a robotic
system that accepts motion and control commands
which can be generated autonomously.

The system developed has been designed to perform
an autonomous grapple based on guidance control
feedback provided by images from a single camera
mounted on the slave robot’s end effector. The vision
system consists of three parts. The first part is signature
based and is trained on an arbitrary grapple interface
(i.e. no special targets are required for guidance); it
provides estimates for the 3D attitude of the interface
by interpolating sampled signature correlations. These
signatures are essentially the distribution of line orientations
obtained by radial integration of the Fourier
transform of a pre-processed edge image. The second
part estimates the range and bearing of the interface
based on the first and second moments of the pre-processed
edge image of the interface. The third part
verifies the results.

The robot path follows a linear translation trajectory 
which is repeatedly adjusted for errors via the vision 
system. The end effector’s attitude is adjusted along 
the trajectory such that the grapple interface always 
remains in center view of the camera. 

Introduction 

Teleoperations are becoming increasingly important
in hazardous environments (e.g. chemical plants, nuclear
power plants, space). Space systems applications,
such as space-based assembly and maintenance, automatic
rendezvous and docking, space exploration, and
satellite monitoring and tracking¹, are of particular interest
due to potentially long delay times between operator
and robot. For instance, it has been estimated
that robotic operations can take several times as long
as extra-vehicular activity (EVA) to perform similar
tasks²,³. Long delay times and limited bandwidth require
the robot to accept only high level commands
and to possess locally a certain degree of autonomy.


Object recognition and attitude determination of 
objects are essential components for successful sen- 
sor based teleoperational semi-autonomous robotic sys- 
tems. This paper will cover camera based systems, due 
to the relatively low cost of CCD cameras and their 
wide use in remote robotic systems. 

Current vision based robotic systems utilize visual
guidance targets. These targets must be placed on objects
with which the robot is to interact⁴. However,
when the objects are not readily accessible to humans,
which is the case when operating in a hostile environment
such as space, the system restricts the class of
robotic interactions to those which are specifically identified
and designed a priori.

The new vision system developed eliminates the need 
for these guidance targets by allowing the object, or 
part of the object (i.e. a grapple fixture), to become the 
robot’s visual guidance target. This is accomplished 
by teaching the vision system the object by presenting 
different views. This training could be done with a 
physical object or by using a CAD model of the object. 

The complete description of a particular target rel- 
ative to the camera consists of six parameters: roll, 
pitch, yaw, range, and two bearing parameters. All six 
can be estimated, in principle, from a single camera 
image and knowledge of the target’s solid geometry. 

We have developed a new technique for determin- 
ing the three-dimensional roll, pitch, and yaw attitude 
target parameters and the three translation parameters 
assuming that the object is known and unoccluded. 

Method 

We restrict the class of images to those of machined 
objects, which characteristically produce sharp edge 
discontinuities. The edge discontinuities result from 
the projection onto the image plane of the polytopes, 
cylinders and conic sections comprising the object. Our 
approach relies on these projected edges as the basic 
features required to analyze and interpret the image 
data. 

Attitude Estimation 

The technique for estimating the attitude relies on 
extracting a signature of the object as viewed by the 
camera, and then matching it against signatures of the 
same object with known attitudes, generated off line 


Copyright ©1993 by the American Institute of Aeronautics 
and Astronautics, Inc. All rights reserved. 




from a model of that object. The attitude estimate is
obtained by interpolating among the signatures with
the highest matching scores. The overall procedure is
diagrammed in Figure 1.



Figure 1: Overall data flow diagram for the estimation 
of the attitude parameters. 

The algorithm computes as a signature the distri- 
bution of edge segments in the image as a function of 
orientation in the image plane. When an object un- 
dergoes an attitude transformation, the distribution of 
line orientations in the image plane changes; therefore, 
the signature contains implicit information about the 
object’s attitude. On the other hand, the signature is 
insensitive to the range and bearing of the object, as 
these do not affect the distribution of line orientations 
in the image. The signature matching procedure ap- 
proximates the inverse map from line orientations to 
object attitude. 

The signature extraction computation involves three
steps. First, a binary line image (Fig. 3) is obtained from the
original picture (Fig. 2), reducing the effect of changes
in illumination of the target. The preprocessing
requires identification of the object within the
field of view, and removal of clutter in the image. We
achieve this by an image segmentation strategy discussed
in detail in the Appendix. The line image is
then mapped into the two-dimensional Fourier domain,
effectively collapsing range and bearing information,
while preserving information on the object’s roll, pitch,
and yaw (Fig. 4). Lastly, a weighted sum of the magnitude
in the Fourier image yields the distribution of line
segments as a function of orientation, which serves as
an attitude signature (Fig. 5); see Gonzalez and Wintz⁵ for an
introductory discussion of the properties of the Fourier
transform applied to image processing.

In particular, the Fourier transform provides an efficient
and robust means of extracting the signature. In
essence, any straight line in the image plane is mapped
by the Fourier transformation into a straight line passing
through the origin of the transform domain, and
orthogonal to the original line. The distance from the
origin of the original line results in a complex phase
modulation of its transform. By linearity of the Fourier
mapping, an image consisting of several straight lines
is transformed into a superposition of lines emanating
from the origin. Thus, a radial integration of the
Fourier transform’s magnitude, about the origin of the
transform domain, yields the desired signature. To
compensate for the finite thickness and length of actual
line segments in the image, the Fourier transform
is radially weighted, to deemphasize edge thickness.
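The signature extraction described above can be sketched as follows. This is a minimal illustration assuming NumPy is available; the function name `orientation_signature`, the number of bins, and the linear radial weighting are assumptions made for the example, since the text does not specify the weighting actually used.

```python
import numpy as np

def orientation_signature(edge_image, n_bins=180):
    """Radially integrate the weighted 2-D FFT magnitude into an
    orientation histogram over [0, 180) degrees."""
    h, w = edge_image.shape
    # Magnitude only: translation of the edges changes just the phase.
    spectrum = np.abs(np.fft.fftshift(np.fft.fft2(edge_image)))
    # Frequency coordinates measured from the origin of the transform domain.
    v, u = np.meshgrid(np.fft.fftshift(np.fft.fftfreq(h)),
                       np.fft.fftshift(np.fft.fftfreq(w)), indexing="ij")
    radius = np.hypot(u, v)
    # Direction of each frequency sample, folded into [0, 180) degrees.
    theta = np.degrees(np.arctan2(v, u)) % 180.0
    bins = np.floor(theta / 180.0 * n_bins).astype(int) % n_bins
    # Assumed radial weighting: a linear ramp, which also removes the DC term.
    weighted = (radius * spectrum).ravel()
    signature = np.bincount(bins.ravel(), weights=weighted, minlength=n_bins)
    return signature / signature.sum()
```

Note that each bin indexes a direction in the transform domain, which is orthogonal to the direction of the corresponding line in the image: a horizontal line in the image contributes to the bin near 90°.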

The attitude parameters are found by performing a 
cyclic cross-correlation of the target signature with the 
library signatures and selecting the maximally corre- 
lated match. The best signature picked reflects the ob- 
ject pitch and yaw. The offset of that signature match 
reflects the roll. Since signatures are 180° symmetri- 
cal there is a 180° ambiguity in the roll measurement. 
This ambiguity will be resolved in the match verifica- 
tion process described in Section 2.3. 
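The matching step can be illustrated with a short sketch, again assuming NumPy. The function name `match_signature` and the zero-mean, unit-norm normalization are assumptions for the example; the interpolation among the best matches mentioned earlier is not shown.

```python
import numpy as np

def match_signature(target, library):
    """Return (best_index, shift_bins, score) for a target signature.

    library is a sequence of 1-D signatures with the same length as target.
    shift_bins is the cyclic offset at which the best library signature
    matches the target; it encodes the roll, up to the 180 degree ambiguity.
    """
    def normalize(s):
        s = np.asarray(s, dtype=float)
        s = s - s.mean()
        return s / (np.linalg.norm(s) + 1e-12)

    t = normalize(target)
    best = (-1, 0, -np.inf)
    for idx, ref in enumerate(library):
        r = normalize(ref)
        # Cyclic cross-correlation for every offset at once via the FFT.
        corr = np.fft.ifft(np.fft.fft(t) * np.conj(np.fft.fft(r))).real
        shift = int(np.argmax(corr))
        if corr[shift] > best[2]:
            best = (idx, shift, float(corr[shift]))
    return best
```

With signatures sampled in `n_bins` bins over 180°, an offset of `shift_bins` corresponds to a roll of `shift_bins * 180 / n_bins` degrees, modulo 180°.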

Position Estimation 

The technique for estimating the range and the two
bearing parameters of the object relies on the center of
gravity $(x_c, y_c)$ and on the sum of the variances along
the x-axis, $\sigma_x^2$, and the y-axis, $\sigma_y^2$, of the object’s edge
image:

$$\sigma^2 = \sigma_x^2 + \sigma_y^2 \qquad (1)$$

The range of the object is determined using $\sigma^2$. It can
be shown that $\sigma^2$ is invariant to rotation and translation
of the image⁶⁻⁸. Using a perspective projection,
and assuming that the size of the object is small compared
to the range, the distance of the object in the
actual image, $z_0$, is given by

$$z_0 = \frac{f_0 \times \sigma_{ref} \times z_{ref}}{f_{ref} \times \sigma_0} \qquad (2)$$

where $\sigma_0$ is the square root of the variance of the actual
image, $f_0$ is the focal length of the lens used, $\sigma_{ref}$ is
the square root of the variance in the edge image of the
matching signature, $f_{ref}$ is the focal length of the lens
used in generating the signature library, and $z_{ref}$ is the
range of the object used during training.
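The moment computation and the range relation of Equation 2 can be sketched as below, assuming NumPy and a binary edge image whose white pixels are non-zero. The function names are illustrative, not taken from the paper.

```python
import numpy as np

def edge_moments(edge_image):
    """Center of gravity (x_c, y_c) and sigma = sqrt(var_x + var_y)
    of the white pixels of a binary edge image."""
    ys, xs = np.nonzero(edge_image)
    sigma = np.sqrt(xs.var() + ys.var())
    return xs.mean(), ys.mean(), sigma

def estimate_range(sigma_0, f_0, sigma_ref, f_ref, z_ref):
    """Range z_0 of the object, following Equation 2."""
    return f_0 * sigma_ref * z_ref / (f_ref * sigma_0)
```

For example, with equal focal lengths, an edge image whose spread is half that of the training image taken at 30 cm gives an estimated range of 60 cm.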

By knowing the deviation of the center of gravity of
the actual edge image against the center of gravity of
the edge image of the matched library signature, the
two bearing components are determined by:

$$\rho = \tan^{-1}\frac{y_c}{f_0} - \tan^{-1}\frac{y_{ref}}{f_{ref}} \qquad (3)$$

$$\phi = \tan^{-1}\frac{x_c}{f_0} - \tan^{-1}\frac{x_{ref}}{f_{ref}} \qquad (4)$$

where $(x_c, y_c)$ and $(x_{ref}, y_{ref})$ are the centers of gravity
of the actual edge image and the training edge image
respectively, and $\rho$ and $\phi$ are the values of the bearing
parameters along the y-axis and x-axis respectively.
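A small sketch of the bearing computation follows. It assumes that the centers of gravity are measured from the image center in the same units as the focal lengths, and it follows the statement above that one bearing component is taken along the y-axis and the other along the x-axis; the function name is hypothetical.

```python
import math

def estimate_bearing(x_c, y_c, f_0, x_ref, y_ref, f_ref):
    """Bearing components (rho, phi) in radians.

    rho is the component along the y-axis, phi the component along
    the x-axis, each the difference of two viewing angles.
    """
    rho = math.atan(y_c / f_0) - math.atan(y_ref / f_ref)
    phi = math.atan(x_c / f_0) - math.atan(x_ref / f_ref)
    return rho, phi
```

When the training image is centered (`x_ref = y_ref = 0`), each component reduces to the viewing angle of the current center of gravity.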


Figure 2: Synthetic image of the Micro Interface device,
a typical machined object.

Figure 3: The edge image for the Micro Interface device.

Figure 4: The weighted 2D FFT of the edge
image of the Micro Interface device.

Figure 5: The extracted signature, encoding the distribution
of line edge orientations in the original image
of the Micro Interface device.


Model Based Attitude Estimation and Verification

The six pose parameters found in Sections 2.1
and 2.2 must be verified and the ambiguity of the
roll must be resolved. This is accomplished with the
help of a perspective projection (overlay) of a three-dimensional
model of the target (Figure 6) in a cross
correlation with the edge image of the object seen by
the camera. The overlay with the highest correlation
yields the best estimate of the attitude and position of
the target.



Figure 6: Data flow diagram for estimating the pose 
of an object based on the projection of a three- 
dimensional model of the target 

The six pose parameters of the n-best matches from 
the signature based algorithm in Section 2.1 were used 
to generate n corresponding overlays. A typical overlay 
is shown in Fig. 7. Those overlays were matched 

The three-dimensional model of the object was defined
in terms of polygons, where each polygon was defined
by its vertices. To each polygon a surface normal
was assigned to calculate the visibility of the polygon
for the current attitude of the object. The visibility
check was achieved by determining the sign of the dot
product between the normal vector and a vector extending
from the polygon to the view point. For
positive values the polygon was visible and for negative
values invisible.
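The visibility test can be sketched in a few lines. This is an illustration only: the function name is hypothetical, and taking the polygon centroid as the start of the vector to the view point is an assumption (for a planar polygon any point on it gives the same sign).

```python
def is_polygon_visible(vertices, normal, view_point):
    """True if the polygon faces the view point.

    vertices: list of (x, y, z) tuples; normal: outward surface normal;
    view_point: (x, y, z) position of the camera.
    """
    n = len(vertices)
    centre = [sum(v[k] for v in vertices) / n for k in range(3)]
    # Vector extending from the polygon to the view point.
    to_viewer = [view_point[k] - centre[k] for k in range(3)]
    dot = sum(normal[k] * to_viewer[k] for k in range(3))
    return dot > 0.0
```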

Figure 7: Example of an overlay which was used in a
cross correlation to estimate the pose of a target (the
jagged edges are caused by the typesetting process).

To increase the robustness and precision of the cross
correlation, we correlate the directions of the image edges
with the directions of the overlay edges. By looking
at the directional image gradient we obtain not only
the strength of the edge but also its direction. The
modified cross correlation can be stated as

$$\mathrm{match}(i,j) = \sum_{x=0}^{N_x} \sum_{y=0}^{N_y} \left| \left\langle pt_{im}(x,y) \cdot pt_{ov}(i+x,\, j+y) \right\rangle \right| \qquad (5)$$

where $\langle \cdot \rangle$ denotes the dot product between two vectors.
The vectors $pt_{im}(x,y)$ and $pt_{ov}(x,y)$ are defined as
the two-dimensional gradient vector $(\partial I/\partial x,\ \partial I/\partial y)$ from the
camera image and overlay image respectively. It has to
be noted that in Equation 5 we only have to perform
the cross correlation in the vicinity of the projection
of the object model because the signature based algorithm
gives reliable estimates of the position of the
target.
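A minimal sketch of the directional correlation of Equation 5 is given below, assuming NumPy. The function names are illustrative, central differences stand in for whatever gradient filter was actually used, and the cyclic wrap-around at the image border is a simplification.

```python
import numpy as np

def gradient_field(image):
    """Per-pixel gradient vectors (d/dx, d/dy), shape (rows, cols, 2)."""
    gy, gx = np.gradient(image.astype(float))
    return np.stack([gx, gy], axis=-1)

def directional_match(image_grad, overlay_grad, i, j):
    """Sum over all pixels of |pt_im(x, y) . pt_ov(x + i, y + j)|."""
    # After the roll, shifted[y, x] holds overlay_grad[y + j, x + i].
    shifted = np.roll(overlay_grad, shift=(-j, -i), axis=(0, 1))
    return float(np.abs(np.sum(image_grad * shifted, axis=-1)).sum())
```

In use, the score would be evaluated only for small offsets `(i, j)` around the position predicted by the signature based stage, and the offset with the largest score retained.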

The combination of the signature based method
shown in Section 2.1 and the above approach based
on cross correlation allows us to overcome one of the
main disadvantages of the model based methods shown
in the literature⁹,¹⁰, where a correspondence had to be
established between image features and model features
to solve for the attitude parameters. With the signature
based algorithm we are able to prune down the
search tree of possible aspects of the model and reduce
the range of the cross correlation considerably.

Robot Control 

The algorithm for moving the robot towards the tar- 
get to perform a grapple is described below: 

• Using a camera mounted on the end effector, estimate
the position and attitude of the target (i.e.
the handle to be grappled) with respect to the
camera. We will call this frame $H$, where the term
frame refers to both attitude and position.

• We can define a frame, $C_f^H$, with respect to the
target, which is the desired final frame of the camera
for the approach. Now we can use our estimate
for the target to come up with an estimate for $C_f$,
called $\hat{C}_f$, with respect to the camera frame $C$.
If $C$ is within tolerable limits of $\hat{C}_f$ then we are
at the final position and attitude and can grapple
the object.

• If we have not reached $\hat{C}_f$ then we can calculate
our next desired camera position by using the following
constraints on the next camera frame, denoted
by $N(C)$.

1. The origin of $N(C)$, denoted by $N(C)_o = N(C_o)$,
falls on the line $C_o C_{fo}$.

2. $N(C)$ should be at most a distance of $d_{max}$
from $C$.

3. The z-axis of $N(C)$, denoted by $N(C_z)$,
should point towards $H_o$.

4. $N(C_x)$ should be perpendicular to both
$N(C_z)$ (of course) and $H_y$. In particular the
sign of the vector is defined by
$N(C_x) = H_y \times N(C_z)$.

• We can calculate $N(G)$ with respect to $G$ from
$N(C)$, since the relationship of the camera frame
$C$ to the end effector frame $G$ is known. This
information can be put in the form of relative
$(x, y, z, R, P, Y)$ moves.

• Command the robot to make the relative move 
calculated above. 

• Repeat the entire process. 
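The four constraints on the next camera frame can be sketched as follows, assuming NumPy and assuming that all points and axes are expressed in one common frame (for instance the current camera frame). The function name and argument names are hypothetical, and the degenerate case where the target y-axis is parallel to the new viewing direction is not handled.

```python
import numpy as np

def next_camera_frame(c_o, cf_o, h_o, h_y, d_max):
    """Return (origin, R) for the next camera frame N(C).

    c_o: current camera origin; cf_o: origin of the desired final frame;
    h_o: target origin; h_y: target y-axis; d_max: largest allowed step.
    The columns of R are the x, y and z axes of N(C).
    """
    c_o, cf_o, h_o, h_y = (np.asarray(a, dtype=float)
                           for a in (c_o, cf_o, h_o, h_y))
    step = cf_o - c_o
    dist = np.linalg.norm(step)
    # Constraints 1 and 2: stay on the line to the goal, move at most d_max.
    origin = cf_o if dist <= d_max else c_o + step * (d_max / dist)
    # Constraint 3: the z-axis points towards the target origin.
    z = h_o - origin
    z = z / np.linalg.norm(z)
    # Constraint 4: x-axis = H_y x N(C_z), which also fixes its sign.
    x = np.cross(h_y, z)
    x = x / np.linalg.norm(x)
    y = np.cross(z, x)  # completes a right-handed frame
    return origin, np.column_stack([x, y, z])
```

Converting the result into a relative move of the end effector frame would additionally require the known camera to end effector transform, which is not modelled here.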


Results 

We have tested the algorithm on a set of synthetic
images of an interface device used in space system applications
(Fig. 2). The Micro Interface device is used
in SSF robotic operations. A ray-tracer was used to
generate the synthetic images of the target under different
aspect transformations with respect to the image plane. Although
the results presented in Sections 3.1-3.4 were generated
with synthetic images, similar results have been obtained
for real camera images.

A 5 x 5 signature library was generated from synthetic
images to cover a square patch 10° on a side
in the pitch-yaw plane, with an inter-signature separation
of 2.5° in each direction. The center orientation
was selected to correspond to a typical view of
the Micro Interface device during a grasping operation. This
signature library was representative of more realistic
libraries covering a larger range of pitch and yaw parameter
values.

Using this signature library, four tests were per- 
formed: 

1. Random roll, pitch and yaw attitude estimation. 

2. Bearing and Range estimation. 

3. Range invariance test of roll, pitch and yaw. 

4. Bearing invariance test of roll, pitch and yaw. 

For consistency with the ray-tracer program, the target’s
attitude in all four tests was represented using
three Euler angles, which measure attitude through a
set of three rotations about the z, x, and again z axes,
in the camera’s frame of reference (the image plane
coincides with the xy-plane, and faces the negative z
axis). We denote these three rotation angles by α, β,
and γ, respectively. The translation components, range
and bearing of the image plane around the x-axis and
y-axis, were denoted by z, φ, and ρ respectively.

The tests provide evidence for the viability of the
approach to 3D attitude and position determination.
The procedure accurately estimates the position and
the three attitude parameters of the object. The algorithm
shows invariance to the range and bearing of
the target for the estimation of the 3D attitude. These
test results are described in Sections 3.1-3.4.

Roll, Pitch, and Yaw Estimation 

A random set of 10 target images with three arbitrary
Euler angles was used to test the algorithm’s ability
to correctly determine the target’s attitude. The
exact and estimated Euler angles are shown in Table 1.

The average error in any one parameter is 0.6°. The
maximum error occurred for the α parameter of Image J,
a difference of 2.7°. For this image, the wrong
library signature was selected in the matching stage.
The difference in the γ parameter partially compensates
for this error, reducing the combined α + γ angular
error for this image to only 1.2°.

Bearing and Range Estimation 

A set of 4 target images was used to test the accuracy 
of the procedure for the bearing parameter and a set of 
6 target images was used to test the accuracy for the 
range. The exact and estimated parameters for range 
and bearing are shown in Table 2 and Table 3. The 
average error is 2.2cm for the range estimate and 0.1° 
for the estimate of the bearing. 

Table 1: Attitude estimation test results.

Image   Exact Angles (degrees)      Estimated Angles (degrees)
          α       β       γ           α       β       γ
A       16.09   22.63   -8.55       16.57   22.15   -9.98
B       15.24   20.46    2.63       15.60   20.33    2.81
C       18.39   23.27    7.69       19.29   22.44    6.19
D       18.40   22.08   -4.55       17.91   21.89   -3.65
E       19.67   23.51   -1.27       20.57   23.11   -1.55
F       16.92   24.55    5.33       16.90   24.36    4.78
G       17.60   23.81   -0.45       17.86   23.51   -0.14
H       19.15   21.31   -5.24       17.95   21.38   -3.51
I       15.17   20.24   -4.50       15.32   20.01   -4.22
J       15.27   23.68   -2.81       12.50   24.74   -0.70

Table 2: Range estimation test results.

Image   Range (cm)   Estimated Range (cm)
A       31           31.17
B       35           35.70
C       39           40.00
D       45           46.30
E       60           62.10
F       90           98.00

Range Invariance

A set of 14 target images was used to test the algorithm’s
sensitivity to the target’s variation in range.
The range of the target in the training images, used
to generate the signature library, was 30 cm from the
image plane. The exact and estimated Euler angles are
shown in Table 4, for various target ranges. Angles are
measured in degrees, range in centimeters.

The errors incurred are moderate, and grow as
the range increases. The maximum error occurred for
the α parameter of Image L, a difference of 5.0°. For
this image, the wrong library signature was selected in
the matching stage. The difference in the γ parameter
partially compensates for this error, reducing the
combined α + γ angular error for this image to only
1.5°.
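As a quick check of the two figures quoted above, the arithmetic can be redone from the Image L row and the reference row as they appear in Table 4. Reading the combined error as the difference between the sums α + γ is an assumption; it reproduces the quoted value.

$$|\Delta\alpha| = |22.500 - 17.500| = 5.000^\circ$$

$$|\Delta(\alpha+\gamma)| = |(22.500 - 3.518) - (17.500 + 0.000)| = 1.482^\circ \approx 1.5^\circ$$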


Table 4: Range invariance test results.

Exact Euler Angles (degrees)

Image   Range (cm)   α        β        γ
*       30           17.500   22.500    0.000

Estimated Euler Angles (degrees)

Image   Range (cm)   α        β        γ
A       30           17.553   22.342    0.000
B       31           17.455   22.199    0.000
C       32           17.514   22.193    0.000
D       33           17.463   22.116    0.000
E       34           17.467   21.989    0.000
F       35           17.436   21.827    0.000
G       36           17.468   21.533    0.000
H       37           17.492   21.529    0.141
I       38           17.412   21.257    0.141
J       39           17.521   21.351    0.000
K       45           18.068   20.050    0.140
L       60           22.500   17.500   -3.518
M       90           17.989   20.885    0.140
N       150          17.776   21.297    0.140


Table 3: Bearing estimation test results.

Image   Bearing (degrees)   Estimated Bearing (degrees)
A       1      0            1.00    0.02
B       2      0            2.00    0.05
C       3      0            3.07    0.13
D       4      0            4.14    0.18


Bearing Invariance 

A set of 5 target images was used to test the algorithm’s
sensitivity to the target’s variation in bearing.
The bearing of the target in the training images used
to generate the signature library was 0° from the image
plane’s normal. The exact and estimated Euler angles
are shown in Table 5, for various target bearings, away
from the image plane’s normal, in the direction of the
positive y-axis. Both Euler angles and bearings are
measured in degrees.

The errors incurred are moderate, with a maximum
error in the β parameter of Image E, a difference of only
0.4°. Bearings of more than 4° would have brought the
target partially outside the field of view of the camera,
and were not tested.

Table 5: Bearing invariance test results.

Exact Euler Angles (degrees)

Image   Bearing (degrees)   α        β        γ
*       0.0                 17.500   22.500   0.000

Estimated Euler Angles (degrees)

Image   Bearing (degrees)   α        β        γ
A       0.0                 17.553   22.342   0.000
B       1.0                 17.435   22.510   0.000
C       2.0                 17.472   22.549   0.000
D       3.0                 17.385   22.735   0.000
E       4.0                 17.337   22.939   0.000


Verification Method Results

Figure 8 shows the overlay matching results. Each
random test point is marked with "+" and has a corresponding
estimate denoted by "o" (correspondence is indicated
by a connecting line). The estimates in this
example fall on a 5° x 5° grid. As seen in the figure,
all but one of the matches fell on the nearest grid point
(i.e. the estimates were within 5 degrees).

Figure 8: Matching random test points to best overlay
candidate (candidates are on a 5 degree spaced grid).
Plot title: pairs of known and estimated orientations.

Conclusion

A procedure has been developed to determine the 3D
attitude and the position of machined objects without
the use of any special marks. Since there is no need for
marks, an existing implementation of the algorithm can
be quickly adapted to a different object by supplying
a signature library for the new target. Moreover, it is
not necessary to possess a physical model of the object
because it is possible to generate the signature library
with a ray-tracer program. In addition, the signatures
require only 1K bytes of memory each. Thus for a typical
signature library of 225 signatures, the signature
library is smaller than one 512 by 512 image.

The algorithm relies on standard image processing
routines (e.g. edge extraction, 2D Fourier Transformation),
which are available in numerous image processing
libraries and fast hardware implementations.

The 2D Fourier Transformation, which is the most
time consuming part of the procedure, is readily parallelized,
so that a real time version of the algorithm can
be achieved by distributed hardware.

Although this method was developed under the assumption
that there is no clutter in the image and that
the target is of a known type, these constraints can be
lifted using additional initial scene analysis. For example,
the scene can be segmented using standard image
processing algorithms, and potential objects can
be compared to known objects via signature matching
and match verification.

Appendix

In order to remove background clutter we use infor- 
mation about the constraints of our scene with respect 
to our target. This appendix discusses in detail the im- 
age preprocessing techniques which we used in connec- 
tion with the robotic grappling application discussed 
in this paper. 

Preprocessing the Raw Image 

We start our processing given a single frame 256 gray
scale image, $I$. The image $I$ is filtered three times,
producing three more useful images. The first is a low
pass filtered version of the raw image, denoted by $\bar{I}$.
The next two filtered images are the x and y gradients
of the raw image, denoted as $\nabla_x I$ and $\nabla_y I$ respectively.
Two binary edge images are then constructed using the
above filtered images.

The first edge image is a thin edge image, $E$, found
by

$$E = \left( \frac{\sqrt{(\nabla_x I)^2 + (\nabla_y I)^2}}{\bar{I} + 1/2} > T \right) \qquad (6)$$

where ">" is treated as a pixel-wise binary output operator
and $T$ denotes the threshold. The above equation attempts to enhance local
edge information by dividing the magnitude of the
gradient of the raw image by the average neighborhood
pixel intensity. Thus small intensity variations in
dark regions could be equivalent to larger variations in
lighter regions¹¹.

The second edge image is a thick edge image, $E^{+}$,
defined as

$$E^{+} = \left( \frac{\sqrt{(\nabla_x I)^2 + (\nabla_y I)^2}}{\bar{I} + 1/2} > T^{+} \right) \qquad (7)$$

The image $E^{+}$ is nearly identical to $E$ except that
the threshold used, written here as $T^{+}$, is lower. Thus, $E^{+}$ contains more
white pixels (pixels which satisfy the binary condition).
Therefore, $E \subseteq E^{+}$ (i.e. every white pixel of $E$ is
a white pixel in $E^{+}$). Note that while $E$ provides a
cleaner edge image, $E^{+}$ preserves the connectivity of
the edge image. This connectivity will be used below
to determine a processing region which rejects background
clutter.
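The construction of the two edge images can be sketched as below, assuming NumPy. The function name, the 3 x 3 box mean used as the low pass filter, the central-difference gradients, and both threshold values are assumptions for illustration; the text only states that the thick edge image uses the lower threshold.

```python
import numpy as np

def edge_images(raw, t_thin=0.5, t_thick=0.25):
    """Thin and thick binary edge images of a gray-scale frame.

    Both come from the gradient magnitude divided by the local mean
    intensity plus 1/2; the thick image uses the lower threshold.
    """
    img = np.asarray(raw, dtype=float)
    rows, cols = img.shape
    # Assumed low pass filter: 3x3 box mean with replicated borders.
    padded = np.pad(img, 1, mode="edge")
    smooth = sum(padded[r:r + rows, c:c + cols]
                 for r in range(3) for c in range(3)) / 9.0
    gy, gx = np.gradient(img)
    ratio = np.hypot(gx, gy) / (smooth + 0.5)
    return ratio > t_thin, ratio > t_thick
```

Because both images threshold the same ratio, every white pixel of the thin image is also white in the thick image.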


Rejecting Background Clutter 


In the discussed application we are interested in find- 
ing a handle which is mounted on a predominantly 
lighter background. In addition we assume that the 
handle structure will be larger than any unwanted clut- 
ter on the same background. Thus, we look for the 
largest edge structure in a dark region which is con- 
tained in a lighter region. This region tells us which 
information in E should be processed and which infor- 
mation should be rejected. Next we must determine 
what is light and what is dark as well as what is con- 
sidered an edge structure. 

Using the original raw image, $I$, we generate a histogram.
This gray level histogram is then clustered into
three fuzzy classes, dark pixels, medium pixels, and light
pixels, by using a fuzzy c-means clustering algorithm¹².
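A compact one-dimensional fuzzy c-means sketch is shown below, assuming NumPy. It clusters the gray levels directly rather than a histogram, which is a simplification; the function name, the fuzzifier `m = 2`, the evenly spaced initial centroids and the fixed iteration count are all assumptions.

```python
import numpy as np

def fuzzy_c_means_1d(values, n_clusters=3, m=2.0, n_iter=50):
    """Sorted cluster centroids of 1-D data (e.g. gray levels)."""
    x = np.asarray(values, dtype=float)
    centers = np.linspace(x.min(), x.max(), n_clusters)
    for _ in range(n_iter):
        # Distance of every sample to every centroid (small offset avoids 0).
        dist = np.abs(x[None, :] - centers[:, None]) + 1e-9
        # Membership u_ik = d_ik^(-2/(m-1)) / sum_j d_jk^(-2/(m-1)).
        inv = dist ** (-2.0 / (m - 1.0))
        u = inv / inv.sum(axis=0)
        # Centroid update: mean of the samples weighted by u^m.
        um = u ** m
        centers = (um @ x) / um.sum(axis=1)
    return np.sort(centers)
```

The returned centroids, in increasing order, play the role of the dark, medium and light cluster centers from which the mid-point thresholds are derived.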

The light regions of the raw image are found using
the mid-point between the dark and medium pixel
cluster centroids as an image threshold. This thresholded
image is then segmented into blobs based on
the pixels’ 8-connectivity⁵,¹³. The connectivity analysis
only reports significant blobs (blobs which contain
a significant number of pixels). Out of all the significant
blobs found, the algorithm picks the one with
the largest area (number of pixels) as the largest light
region.
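The blob selection can be illustrated with a plain breadth-first labelling, assuming NumPy for the image arrays. The function name and the `min_pixels` parameter, which stands in for the unspecified significance criterion, are hypothetical.

```python
from collections import deque
import numpy as np

def largest_blob(binary, min_pixels=1):
    """Mask of the largest 8-connected blob with at least min_pixels
    pixels, or None if no blob qualifies."""
    binary = np.asarray(binary, dtype=bool)
    rows, cols = binary.shape
    seen = np.zeros_like(binary)
    best = []
    for r0, c0 in zip(*np.nonzero(binary)):
        if seen[r0, c0]:
            continue
        seen[r0, c0] = True
        queue, blob = deque([(r0, c0)]), []
        while queue:
            r, c = queue.popleft()
            blob.append((r, c))
            # Visit all 8 neighbours (the centre itself is already seen).
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    rr, cc = r + dr, c + dc
                    if (0 <= rr < rows and 0 <= cc < cols
                            and binary[rr, cc] and not seen[rr, cc]):
                        seen[rr, cc] = True
                        queue.append((rr, cc))
        if len(blob) >= min_pixels and len(blob) > len(best):
            best = blob
    if not best:
        return None
    mask = np.zeros_like(binary)
    blob_rows, blob_cols = zip(*best)
    mask[blob_rows, blob_cols] = True
    return mask
```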

Next, the raw image is thresholded by the mid-point
between the light and medium pixel cluster centroids.
This time, all pixels below the threshold are considered
logical 1 and all above are logical 0. This new binary
image is combined with $E^{+}$ using a logical pixel-wise
AND. The resulting binary image contains edge structure
in the dark regions of the raw image.


The edge structure is applied to the connectivity
analysis algorithm to find all significant connected edge
structures in dark regions of the raw image. The resulting
processing region is then determined to be the
largest edge structure in a dark region which is within
the largest light region. If no processing region is found,
then the largest edge structure in a dark region becomes
the processing region. If no significant
edge structures are found in a dark region, a warning
message is issued and the entire image is used as the
processing region.